[D] ReLU activated feed-forward network learns from back. Why? • r/MachineLearning

@machinelearnbot

I've been spending some time looking at the convergence behavior of different neural networks trained on MNIST with cross-entropy loss. I started by training deeper and deeper networks with sigmoid-type activations until the learning efficiency got too low, and then switched to ReLU activations. With ReLU, my network converged without too many problems, but I noticed that the rate of learning exhibited an interesting pattern. In particular, it takes a complete epoch before the loss begins to fall. My weights and biases are initialized uniformly, with the weights drawn from between -0.1 and 0.1.
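
For anyone who wants to poke at this, here is a minimal sketch of one way to look at how much gradient each layer receives under the setup described above. It is a diagnostic, not an explanation, and it is not the original training code. Assumptions that are not from the post: the layer sizes, the random input standing in for an MNIST image, the bias range (the post only gives the weight range), and the helper names `init_layer`, `forward`, `softmax`, and `layer_grad_norms`, which are all hypothetical.

```python
import math
import random

def init_layer(n_in, n_out, rng, limit=0.1):
    """Uniform init in [-limit, limit]. The post gives this range for the
    weights only; reusing it for the biases is an assumption."""
    W = [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.uniform(-limit, limit) for _ in range(n_out)]
    return W, b

def forward(layers, x):
    """Return the input of every layer plus the final logits.
    ReLU is applied to hidden layers only, not to the output layer."""
    acts = [x]
    for i, (W, b) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, acts[-1])) + bi for row, bi in zip(W, b)]
        if i < len(layers) - 1:
            z = [max(0.0, v) for v in z]
        acts.append(z)
    return acts

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def layer_grad_norms(layers, x, label):
    """Backpropagate softmax cross-entropy for one sample and return the
    L2 norm of the weight gradient of each layer (index 0 = first layer)."""
    acts = forward(layers, x)
    probs = softmax(acts[-1])
    # Gradient of cross-entropy with respect to the logits: p - one_hot(label)
    delta = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    norms = [0.0] * len(layers)
    for i in reversed(range(len(layers))):
        W, _ = layers[i]
        a_in = acts[i]
        # dL/dW[j][k] = delta[j] * a_in[k]
        norms[i] = math.sqrt(sum((d * a) ** 2 for d in delta for a in a_in))
        if i > 0:
            # Push the error back one layer; the ReLU derivative is 1 where
            # the unit was active (output > 0) and 0 otherwise.
            delta = [
                sum(W[j][k] * delta[j] for j in range(len(delta))) if a_in[k] > 0 else 0.0
                for k in range(len(a_in))
            ]
    return norms

rng = random.Random(0)
sizes = [64, 32, 32, 32, 32, 10]  # hypothetical sizes, much smaller than MNIST's 784 inputs
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
x = [rng.random() for _ in range(sizes[0])]  # random stand-in for a flattened image
norms = layer_grad_norms(layers, x, label=3)
for i, n in enumerate(norms):
    print(f"layer {i}: |dL/dW| = {n:.3e}")
```

Comparing the printed norms from the first layer to the last, and repeating that at a few points during training, would show whether the layers near the output are the ones that start moving first. Changing `limit` or the layer widths is a quick way to check how sensitive that picture is to the initialization.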